Efficient marginalization to compute protein posterior probabilities from shotgun mass spectrometry data.
Abstract
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, "degenerate" peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein's presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.
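To make the idea of marginalizing over a protein-peptide graph concrete, here is a minimal Python sketch. It assumes a simple noisy-OR model, which is not necessarily the model used in the paper: each protein is present independently with prior `gamma`, each present protein emits each of its peptides with probability `alpha`, a peptide can also appear as background noise with probability `beta`, and every peptide is treated as either observed or not. The function names, parameter values, and protein/peptide identifiers are hypothetical. The only graph transformation shown is splitting the bipartite graph into connected components. Degenerate peptides are exactly what join proteins into a shared component. Because the prior factorizes over proteins and each peptide's likelihood depends only on proteins in its own component, each component can be marginalized independently, here by exhaustive enumeration.

```python
from itertools import product


def connected_components(protein_to_peptides):
    """Split the bipartite protein-peptide graph into connected components.

    Proteins that share a (degenerate) peptide end up in the same component.
    Uses a small union-find keyed by protein name.
    """
    parent = {prot: prot for prot in protein_to_peptides}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}  # peptide -> first protein seen that contains it
    for prot, peptides in protein_to_peptides.items():
        for pep in peptides:
            if pep in first_owner:
                parent[find(prot)] = find(first_owner[pep])
            else:
                first_owner[pep] = prot

    groups = {}
    for prot in protein_to_peptides:
        groups.setdefault(find(prot), []).append(prot)
    return list(groups.values())


def component_posteriors(proteins, protein_to_peptides, observed, alpha, beta, gamma):
    """Exact P(protein present | data) by enumerating all 2^n protein subsets."""
    peptides = sorted({pep for prot in proteins for pep in protein_to_peptides[prot]})
    totals = dict.fromkeys(proteins, 0.0)
    evidence = 0.0  # normalizing constant for this component
    for states in product((False, True), repeat=len(proteins)):
        present = [prot for prot, on in zip(proteins, states) if on]
        # Independent prior: each protein is present with probability gamma.
        weight = 1.0
        for on in states:
            weight *= gamma if on else 1.0 - gamma
        # Hypothetical noisy-OR emission: k present parent proteins plus noise beta.
        for pep in peptides:
            k = sum(1 for prot in present if pep in protein_to_peptides[prot])
            p_emit = 1.0 - (1.0 - beta) * (1.0 - alpha) ** k
            weight *= p_emit if pep in observed else 1.0 - p_emit
        evidence += weight
        for prot in present:
            totals[prot] += weight
    return {prot: totals[prot] / evidence for prot in proteins}


def protein_posteriors(protein_to_peptides, observed, alpha=0.9, beta=0.01, gamma=0.5):
    """Marginalize each connected component independently and merge the results."""
    posteriors = {}
    for component in connected_components(protein_to_peptides):
        posteriors.update(
            component_posteriors(component, protein_to_peptides, observed, alpha, beta, gamma)
        )
    return posteriors


if __name__ == "__main__":
    # Toy graph: PEP2 is degenerate (shared by ProtA and ProtB).
    graph = {"ProtA": {"PEP1", "PEP2"}, "ProtB": {"PEP2"}, "ProtC": {"PEP3"}}
    post = protein_posteriors(graph, observed={"PEP1", "PEP2"})
    for prot, prob in sorted(post.items()):
        print(f"{prot}: {prob:.3f}")
```

In this toy example, ProtA and ProtB share the degenerate peptide PEP2, so they are scored jointly; ProtA receives the higher posterior because it alone can explain PEP1, while ProtC, whose only peptide was not observed, falls below its prior. Enumeration is exponential in component size, so in this sketch it is only practical when components stay small; reducing that cost on large data sets is the role the abstract assigns to the graph-transforming algorithms.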
Journal: Journal of Proteome Research, Volume 9, Issue 10
Publication date: 2010